Packages: tidyverse, janitor, sf, here, blogdown
Reminders:
- Use library(package_name) to attach an installed package
- Run install.packages("package_name") in the Console to install a package you don't have yet
library(tidyverse)
library(janitor)
library(here)
library(plotly)
library(gghighlight)
library(sf)
library(blogdown)
Data: Prison populations in the United States from The Vera Institute
Reminders:
- Insert the pipe operator (%>%) by typing it in R, or using the shortcut Command + Shift + M
- Use here() to navigate to folders not in your top-level working directory (discuss: why is this important?)
us_prison <- read_csv(here("data","incarceration_trends.csv"))
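As a small illustration of the here() reminder: here() builds a path starting from the project root rather than from wherever the current script happens to sit, so the same call works from any subfolder. This sketch assumes the here package is installed and reuses the file name from above; the file does not need to exist for the path to be built.

```r
library(here)

# Build a path relative to the project root (nothing is read from disk here)
csv_path <- here("data", "incarceration_trends.csv")

csv_path                # full path ending in data/incarceration_trends.csv
file.exists(csv_path)   # TRUE only if the file is present in this project
```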
Always look at your data.
Every time.
Every. Single. Time.
Here are some useful functions for data exploration:
- View() - or alternatively, click on object in ‘Environment’ tab
- summary()
- head()
Let’s use them to check out the us_prison object we’ve just stored:
# View(us_prison)
# summary(us_prison)
# head(us_prison)
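To see what these functions return without loading the full dataset, here is a minimal sketch on a toy data frame. The values are made up for illustration (they are not the Vera Institute data), and str() is an extra base R function not mentioned in the list above.

```r
# Toy data frame with invented values
toy <- data.frame(
  year = c(2009, 2010, 2010, 2011),
  state = c("CA", "CA", "OR", "OR"),
  total_prison_pop = c(100, 120, NA, 40)
)

summary(toy)   # per-column summaries, including a count of NA values
head(toy, 2)   # first two rows
str(toy)       # column types at a glance
```

The NA count reported by summary() is often the first hint that a step like drop_na() will be needed later.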
Familiarize yourself with the data. Note that there are total populations and population breakdowns by sex and race for each county, as well as jail and prison populations for each county by sex and race.
In this review section, we’ll only explore the proportion of imprisoned people who are black in California prisons over time.
Reminders:
The steps we’ll use here:
- dplyr::select() to choose which columns to keep (unnecessary, but to remind ourselves)
- dplyr::filter() to only keep observations from California
- tidyr::drop_na() to remove any rows where the prison populations were not reported
- dplyr::group_by() + dplyr::summarize() to calculate the totals each year for the entire state
- dplyr::ungroup() to get rid of any grouping
- dplyr::mutate() to add a column that is the proportion of imprisoned people who are black each year
Here is what that sequence looks like using the pipe operator:
ca_prison_prop_bl <- us_prison %>%
  select(year, state, county_name, total_prison_pop, black_prison_pop) %>%
  filter(state == "CA") %>%
  drop_na(total_prison_pop, black_prison_pop) %>%
  group_by(year) %>%
  summarize(
    tot_pris_pop = sum(total_prison_pop),
    pris_pop_black = sum(black_prison_pop)
  ) %>%
  ungroup() %>%
  mutate(prop_black = pris_pop_black / tot_pris_pop)
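To make each step of that sequence easier to follow, here is the same pipeline run on a tiny hypothetical table where the result can be checked by hand. All county names and numbers below are invented; only the column names match the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical county-level rows; all numbers are invented for illustration
toy_prison <- tibble(
  year = c(2000, 2000, 2000, 2001, 2001),
  state = c("CA", "CA", "OR", "CA", "CA"),
  county_name = c("A", "B", "C", "A", "B"),
  total_prison_pop = c(100, 300, 50, 200, NA),
  black_prison_pop = c(20, 80, 5, 50, 10)
)

toy_prop <- toy_prison %>%
  filter(state == "CA") %>%                          # drops the OR row
  drop_na(total_prison_pop, black_prison_pop) %>%    # drops county B in 2001
  group_by(year) %>%
  summarize(
    tot_pris_pop = sum(total_prison_pop),
    pris_pop_black = sum(black_prison_pop)
  ) %>%
  ungroup() %>%
  mutate(prop_black = pris_pop_black / tot_pris_pop)

toy_prop
```

Removing the incomplete rows before summing matters because sum() returns NA as soon as one input is NA, which would wipe out the whole year.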
Let’s refresh our data viz skills with ggplot2 by creating a graph of the proportion of imprisoned people in California who are black from 1983 - 2015:
ggplot(data = ca_prison_prop_bl, aes(x = year, y = prop_black)) +
  geom_line() +
  scale_y_continuous(limits = c(0, 0.40)) +
  theme_minimal() +
  labs(x = "Year",
       y = "Proportion black (/ California total imprisoned)")
The glory of reproducible code! I can copy the code from above, EXCEPT:
- Remove the filter for state == "CA"
- Group by year AND state
us_prison_prop_bl <- us_prison %>%
  select(year, state, county_name, total_prison_pop, black_prison_pop) %>%
  drop_na(total_prison_pop, black_prison_pop) %>%
  group_by(year, state) %>%
  summarize(
    tot_pris_pop = sum(total_prison_pop),
    pris_pop_black = sum(black_prison_pop)
  ) %>%
  ungroup() %>%
  mutate(prop_black = pris_pop_black / tot_pris_pop)
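One reason the ungroup() step is worth keeping once there are two grouping variables: by default, summarize() only peels off the last grouping variable, so the result is still grouped by the first. A minimal sketch on invented numbers, assuming a recent dplyr:

```r
library(dplyr)

# Invented numbers: two states over two years
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  state = c("CA", "OR", "CA", "OR"),
  total_prison_pop = c(400, 50, 200, 60)
)

summed <- toy %>%
  group_by(year, state) %>%
  summarize(tot_pris_pop = sum(total_prison_pop))

group_vars(summed)            # "year": the result is still grouped
group_vars(ungroup(summed))   # character(0): no grouping left
```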
And let’s make a plot of all 50 (or try to):
ggplot(data = us_prison_prop_bl, aes(x = year, y = prop_black)) +
  geom_line()
Yuck! What’s happening there?
ggplot has no idea that there is a variable for ‘state’ that we’d want to group by. We can do that a number of ways, but one is to change an aesthetic (like line color) based on the grouping variable:
state_graph <- ggplot(data = us_prison_prop_bl, aes(x = year, y = prop_black)) +
  geom_line(aes(color = state)) +
  theme_minimal() +
  labs(x = "Year",
       y = "Proportion of state prisoners who are black")
state_graph
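Another of the ways to tell ggplot about the grouping variable is the group aesthetic, which draws one line per state without assigning 50 colors. A minimal sketch on invented data (the data frame and its values are placeholders, not the real proportions):

```r
library(ggplot2)

# Invented proportions for two states
toy <- data.frame(
  year = rep(2000:2002, times = 2),
  state = rep(c("CA", "TX"), each = 3),
  prop_black = c(0.30, 0.29, 0.28, 0.40, 0.38, 0.37)
)

# group = state tells geom_line() to connect points within each state only
toy_graph <- ggplot(data = toy, aes(x = year, y = prop_black)) +
  geom_line(aes(group = state)) +
  theme_minimal()

toy_graph
```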
That’s pretty hard to digest (also, whoa). Some other ways we can break things down:
Interactive graphs with plotly:
ggplotly(state_graph)
What if I want to just highlight a state or two of interest?
Then I could use gghighlight:
state_graph +
  gghighlight(state == "TX" | state == "CA")
## Warning: You set use_group_by = TRUE, but grouped calculation failed.
## Falling back to ungrouped filter operation...
## label_key: state
First, wrangle object us_prison_prop_bl to just get observations from 2010:
prop_bl_2010 <- us_prison_prop_bl %>%
  filter(year == 2010)
A ggplot first:
Note: the fct_reorder() here will make them show up in meaningful order, not in the default alphabetical order for character data.
ggplot(data = prop_bl_2010, aes(x = fct_reorder(state, prop_black), y = prop_black)) +
  geom_col(aes(fill = prop_black)) +
  theme_minimal() +
  labs(x = "State abbreviation",
       y = "Proportion of imprisoned people who are black\n(2010 data only)") +
  coord_flip()
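To see what fct_reorder() is doing in that plot, compare the default factor levels with the reordered ones on a tiny example. The three values are invented; fct_reorder() comes from forcats, which is attached with the tidyverse.

```r
library(forcats)

# Invented values for three states
state <- c("CA", "OR", "TX")
prop_black <- c(0.30, 0.10, 0.40)

levels(factor(state))                    # alphabetical: "CA" "OR" "TX"
levels(fct_reorder(state, prop_black))   # by value:     "OR" "CA" "TX"
```

ggplot draws discrete axes in level order, which is why reordering the levels reorders the bars.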
And that’s what we want to show on a map of the United States.
Get the US states data:
states <- read_sf(dsn = here("data","us_spatial"), layer = "states")
Use plot to look at it quickly:
plot(states)
And see what the sf object actually looks like (hint: it looks like a regular data frame, but geometries are sticky).
View(states)
Aha. Now we want to merge the spatial data with the prison attributes:
prison_spatial <- states %>%
  left_join(prop_bl_2010, by = c("STATE_ABBR" = "state"))
Then look at prison_spatial - notice that whichever states had data for 2010 now show up with the aligned spatial information!
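A left join keeps every row of the left table (here, the states) and fills in NA wherever the right table has no match. A minimal sketch with plain tibbles standing in for the spatial and summary tables; the values are invented:

```r
library(dplyr)

# Stand-ins for the spatial table and the yearly summary (values invented)
toy_states <- tibble(STATE_ABBR = c("CA", "OR", "WA"))
toy_summary <- tibble(state = c("CA", "WA"), prop_black = c(0.30, 0.20))

toy_joined <- toy_states %>%
  left_join(toy_summary, by = c("STATE_ABBR" = "state"))

toy_joined   # OR is kept, with prop_black = NA
```

On a map, states with NA in the fill variable are typically drawn in ggplot's NA color (grey by default) rather than dropped.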
Finally, let’s make a map of it:
ggplot() +
  geom_sf(data = prison_spatial,
          aes(fill = prop_black),
          size = 0.2) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme_minimal()
sf objects? They’re so cool because sticky geometries mean that you get to wrangle as you would with a normal data frame, but the spatial information is retained!
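Here is a minimal sketch of that stickiness, using three invented points instead of the states layer (the coordinates are placeholders, not real locations). Even though select() only asks for one column, the geometry column comes along.

```r
library(dplyr)
library(sf)

# Three invented points; coordinates are placeholders
toy_sf <- st_as_sf(
  data.frame(name = c("a", "b", "c"), value = c(1, 2, 3),
             lon = c(-120, -121, -122), lat = c(35, 40, 45)),
  coords = c("lon", "lat"), crs = 4326
)

toy_subset <- toy_sf %>%
  filter(value > 1) %>%
  select(name)          # geometry is not selected explicitly...

names(toy_subset)       # ...but it is still there: "name" "geometry"
class(toy_subset)       # still an sf object
```

If a plain data frame is ever needed, sf::st_drop_geometry() removes the geometry column on purpose.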
Example: From prison_spatial, filter to only include CA, OR and WA. Make a choropleth based on the total prison population (tot_pris_pop) for the three states.
west_coast_prison <- prison_spatial %>%
  filter(STATE_ABBR %in% c("CA", "OR", "WA"))
ggplot(data = west_coast_prison) +
  geom_sf(aes(fill = tot_pris_pop))